DocPuzzle: A Process-Aware Benchmark for Evaluating Realistic Long-Context Reasoning Capabilities
Zhuang, Tianyi, Kuang, Chuqiao, Li, Xiaoguang, Teng, Yihua, Wu, Jihao, Wang, Yasheng, Shang, Lifeng
We present DocPuzzle, a rigorously constructed benchmark for evaluating long-context reasoning capabilities in large language models (LLMs). The benchmark comprises 100 expert-level QA problems requiring multi-step reasoning over long real-world documents. To ensure task quality and complexity, we implement a human-AI collaborative annotation-validation pipeline. DocPuzzle introduces an innovative evaluation framework that mitigates guessing bias through checklist-guided process analysis, establishing new standards for assessing reasoning capacities in LLMs. Our evaluation results show that: 1) advanced slow-thinking reasoning models such as o1-preview (69.7%) and DeepSeek-R1 (66.3%) significantly outperform the best general instruct models such as Claude 3.5 Sonnet (57.7%); 2) distilled reasoning models such as DeepSeek-R1-Distill-Qwen-32B (41.3%) fall far behind their teacher model, suggesting that it is difficult to preserve the generalization of reasoning capabilities through distillation alone.
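To make the checklist-guided process analysis concrete, here is a minimal sketch of how process-aware scoring can discount lucky guesses. All names (ChecklistItem, score_response, the example checklist) are hypothetical illustrations under the abstract's description, not the authors' actual pipeline.

```python
# Sketch: credit an answer only in proportion to the required reasoning
# steps it demonstrably performed, so a correct final answer with no
# supporting process scores low. Hypothetical, not the DocPuzzle code.
from dataclasses import dataclass

@dataclass
class ChecklistItem:
    description: str   # one required reasoning step, written by annotators
    satisfied: bool    # judged from the model's reasoning trace

def score_response(final_answer_correct: bool,
                   checklist: list[ChecklistItem]) -> float:
    """Combine answer correctness with checklist coverage."""
    if not checklist:
        return 1.0 if final_answer_correct else 0.0
    process_score = sum(item.satisfied for item in checklist) / len(checklist)
    # A correct answer is only rewarded to the extent the process supports it.
    return process_score if final_answer_correct else 0.0

# Example: a correct answer backed by 2 of 4 required steps earns 0.5,
# not full credit, which is one way guessing bias can be mitigated.
items = [ChecklistItem("locates the relevant clause in the document", True),
         ChecklistItem("resolves the cross-reference to the appendix", True),
         ChecklistItem("computes the intermediate quantity", False),
         ChecklistItem("verifies the boundary condition", False)]
print(score_response(final_answer_correct=True, checklist=items))  # 0.5
```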
Robust Learning via Conditional Prevalence Adjustment
Nguyen, Minh, Wang, Alan Q., Kim, Heejong, Sabuncu, Mert R.
Healthcare data often come from multiple sites, across which the correlations between confounding variables can vary widely. If deep learning models exploit these unstable correlations, they may fail catastrophically in unseen sites. Although many methods have been proposed to tackle unstable correlations, each has its limitations. For example, adversarial training forces models to completely ignore unstable correlations, but doing so may lead to poor predictive performance. Other methods (e.g. invariant risk minimization [4]) try to learn domain-invariant representations that rely only on stable associations by assuming a causal data-generating process (input X causes class label Y). Thus, they may be ineffective for anti-causal tasks (Y causes X), which are common in computer vision. We propose a method called CoPA (Conditional Prevalence-Adjustment) for anti-causal tasks. CoPA assumes that (1) the generation mechanism is stable, i.e. label Y and confounding variable(s) Z generate X, and (2) the unstable conditional prevalence in each site E fully accounts for the unstable correlations between X and Y. Our crucial observation is that confounding variables are routinely recorded in healthcare settings, so the prevalence can be readily estimated, for example, from a set of (Y, Z) samples (no need for corresponding samples of X). CoPA can work even if there is a single training site, a scenario that is often overlooked by existing methods. Our experiments on synthetic and real data show that CoPA outperforms competitive baselines.
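The core idea lends itself to a short sketch. Under the stable generation assumption that p(x | y, z) is shared across sites, Bayes' rule gives p(y | x, z, e) proportional to p(x | y, z) p(y | z, e), so a classifier trained at one site can be re-targeted to a new site by swapping in that site's conditional prevalence p(y | z, e), estimated from (Y, Z) pairs alone. The function and variable names below are illustrative under that reading of the abstract, not the authors' implementation.

```python
# Sketch of conditional prevalence adjustment in the spirit of CoPA.
# Hypothetical helper names; only (y, z) pairs from the target site are
# needed to estimate its prevalence, no inputs X.
import numpy as np

def estimate_prevalence(y: np.ndarray, z: np.ndarray, n_classes: int,
                        n_groups: int, alpha: float = 1.0) -> np.ndarray:
    """Estimate p(y | z) from labeled (y, z) pairs with Laplace smoothing."""
    counts = np.full((n_groups, n_classes), alpha)
    np.add.at(counts, (z, y), 1.0)                 # tally (z, y) co-occurrences
    return counts / counts.sum(axis=1, keepdims=True)

def adjust_logits(logits: np.ndarray, z: np.ndarray,
                  train_prev: np.ndarray, target_prev: np.ndarray) -> np.ndarray:
    """Shift each example's logits by the log-ratio of target-site to
    training-site conditional prevalence for its confounder group z."""
    return logits + np.log(target_prev[z]) - np.log(train_prev[z])
```

Because the adjustment only rescales class scores per confounder group, it works even with a single training site: the training prevalence comes from that one site, and the target prevalence from the deployment site's (Y, Z) records.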
Minecraft Used as Training Site for New AI
Minecraft is one of the most popular games of our time. Known for its block-based aesthetics and nearly limitless exploration, the game lets players do almost whatever they want. Because of these endless possibilities, Minecraft has become central to the work of an artificial intelligence (AI) research group. As reported by MIT Technology Review, researchers at Facebook have been using Minecraft as the training system for their new AI. While most contemporary successful AI systems are designed to excel at a single task, these researchers aim to create an AI that can do many things well as it assists a player in their Minecraft adventure.